Towards Expressive Video Dubbing with Multiscale Multimodal Context Interaction

Zhao, Yuan, Liu, Rui, Cong, Gaoxiang

arXiv.org Artificial Intelligence

Automatic Video Dubbing (AVD) generates speech aligned with lip motion and facial emotion from scripts. Recent research focuses on modeling multimodal context to enhance prosody expressiveness but overlooks two key issues: 1) Multiscale prosody expression attributes in the context influence the current sentence's prosody. 2) Prosody cues in the context interact with the current sentence, impacting the final prosody expressiveness. To tackle these challenges, we propose M2CI-Dubber, a Multiscale Multimodal Context Interaction scheme for AVD. This scheme includes two shared M2CI encoders to model the multiscale multimodal context and facilitate its deep interaction with the current sentence. By extracting global and local features for each modality in the context, utilizing attention-based mechanisms for aggregation and interaction, and employing an interaction-based graph attention network for fusion, the proposed approach enhances the prosody expressiveness of synthesized speech for the current sentence. Experiments on the Chem dataset show that our model outperforms baselines in dubbing expressiveness. The code and demos are available at https://github.com/AI-S2-Lab/M2CI-Dubber.
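The attention-based aggregation step described above can be illustrated with a minimal sketch: a current-sentence feature vector attends over context feature vectors and pools them into a single weighted summary. This is a hypothetical simplification in plain Python (the vector dimensions, query/key naming, and the use of scaled dot-product attention are assumptions, not the paper's exact design):

```python
import math

def attention_pool(query, keys):
    """Aggregate context feature vectors into one vector via
    scaled dot-product attention against a query vector.
    Hypothetical simplification of attention-based aggregation."""
    d = len(query)
    # Similarity of the query to each context vector, scaled by sqrt(d)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    # Numerically stable softmax over the scores
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the context vectors
    return [sum(w * key[i] for w, key in zip(weights, keys))
            for i in range(d)]

# A current-sentence feature attends over two context-sentence features;
# the context vector more similar to the query receives more weight.
pooled = attention_pool([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
```

In the paper's setting this kind of pooling would be applied per modality and per scale before the graph-attention fusion; the sketch only shows the core weighting mechanism.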


Continuous Speech Recognition using EEG and Video

Krishna, Gautam, Carnahan, Mason, Tran, Co, Tewfik, Ahmed H

arXiv.org Machine Learning

In this paper we investigate whether electroencephalography (EEG) features can be used to improve the performance of continuous visual speech recognition systems. We implemented a connectionist temporal classification (CTC) based end-to-end automatic speech recognition (ASR) model for performing recognition. Our results demonstrate that EEG features are helpful in enhancing the performance of continuous visual speech recognition systems. In recent years there has been a lot of interesting work done in the fields of lip reading and audio-visual speech recognition. In [1] the authors demonstrated end-to-end sentence-level lip reading, and in [2] the authors demonstrated deep-learning-based end-to-end audio-visual speech recognition.
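The CTC decoding rule at the heart of such an end-to-end model can be sketched in a few lines: after taking the per-frame argmax over the network's label distribution, consecutive repeats are merged and blank symbols are dropped. The label ids and blank index below are hypothetical; this shows only the standard CTC collapse step, not the paper's full model:

```python
def ctc_collapse(frame_labels, blank=0):
    """Greedy CTC decoding step: merge consecutive repeated
    labels, then remove blanks. frame_labels is the per-frame
    argmax sequence; blank=0 is an assumed blank-symbol id."""
    out = []
    prev = None
    for label in frame_labels:
        # Emit a label only when it differs from the previous
        # frame's label and is not the blank symbol
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out

# Per-frame argmax labels -> collapsed output label sequence
seq = ctc_collapse([0, 1, 1, 0, 2, 2, 2, 0, 1])
# yields [1, 2, 1]
```

In a full system the same collapse is applied regardless of whether the frame-level features come from video alone or from concatenated EEG and video streams; the fusion happens earlier, at the feature level.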